Abstract - Obstructive Sleep Apnea (OSA) is a disease
in which airways involuntarily collapse during sleep, leading
to serious consequences. About 10% of snorers suffer
from OSA, unknown to them, nevertheless requiring medical
attention. The current standard of diagnosis for OSA,
polysomnography (PSG), requires that the patients spend
one full day in a hospital, wired to a multitude of instruments.
PSG is complicated, expensive, and unsuitable for
mass screening of the population. OSA is commonly
accompanied by snoring. Even though snoring carries vital
information on the state of the airways, it has rarely been
used in diagnosing OSA. In this paper, we present a
mathematical model for snoring, and illustrate its usefulness in
diagnosing OSA. We exploit similarities and differences
between speech and snoring signals to separate the two,
and provide new features to diagnose OSA at low cost.
Via experiments carried out in a hospital sleep laboratory,
we illustrate the importance of using noise reduction
techniques to acquire snoring data with sufficient integrity.
Keywords - Snoring, Apnea, Pitch Analysis, Vocal Tract.

I.  INTRODUCTION 

An OSA attack is characterized by repeated episodes
of upper airway closure during sleep, and is defined as
the total cessation of respiratory airflow that lasts at least
10 s. OSA events are typically terminated by a premature
arousal from sleep, with the most common presenting symptom
being loud and interrupted snoring.

Even though OSA appears benign at first glance, it
leads  to  a  large  number  of  untimely  deaths.  Among  the 
known  problems  associated  with  OSA  are  hypertension, 
ischemic  heart  disease  and  stroke.  In  addition,  OSA  is 
responsible  for  industrial  accidents,  driving  fatalities  and 
lost  production  due  to  daytime  sleepiness  of  operators. 

The current standard of diagnosis, PSG, requires that
the patient sleep for a day in a hospital, under video
surveillance and wired to a multitude of instruments. In
a typical PSG session, signals/parameters such as ECG,
EEG, EMG, EOG, nasal/oral airflow, respiratory effort,
neck vibrations, body positions, body movements and
the blood oxygen saturation of the patient are carefully
monitored. Interpreting the PSG of a patient is also
a complex process, demanding the attention of a
trained expert. The limited PSG facilities around the
world have resulted in long waiting lists, making it an
impossible task to test all the patients in need of such
assessment.

There have been a few attempts at using snoring sounds
to diagnose OSA [1], [2], [3]. The features commonly
used to characterize snoring sounds were the sound intensity
or the peak frequency of the snore spectrum. OSA is
primarily caused by structural abnormalities in the upper
airway during sleep, and the features used did not,
unfortunately, directly correspond to OSA. Furthermore,
in all of [1], [2], [3], raw snoring data were used without
any processing to reduce noise captured together with
the data. Even in the controlled setting of the hospital's
sleep clinic, the recorded snoring sounds are usually corrupted
with background noise, leading to inconsistencies
in analysis. One of the serious problems hindering snore
analysis is the unavailability of methods to automatically
separate genuine snoring sounds from other biological
sounds such as somniloquous speech.

This paper addresses those concerns and makes the following
contributions:

• We show the importance of acquiring snoring signals
at a sufficient Signal-to-Noise Ratio (SNR), if reasonable
results are to be expected. We present methods to enhance
the SNR of recordings.

• We propose a mathematical model for snoring, and illustrate
its usefulness in separating genuine snoring from
other biological sounds such as somniloquous speech.

• We show that the proposed mathematical model allows
us to devise novel signatures to diagnose OSA effectively.

II. A MODEL FOR SNORING

We model the discretized sound $y[n]$ recorded simultaneously
with a PSG session as

$$y[n] = s_s[n] + s_p[n] + b[n] \tag{1}$$

$$\phantom{y[n]} = s[n] + b[n], \tag{2}$$

where $s_s[n]$ is the clean snoring sound, $s_p[n]$ is the
somniloquous speech of the patient, or speech from external
sources, and $b[n]$ is the background electrical and
acoustical noise. The quantity $s[n] = s_s[n] + s_p[n]$
combines clean snoring and somniloquous speech.

The component $s_p[n]$ can be described using the
source-vocal tract model in speech synthesis [4], i.e.,

$$s_p[n] = h_p[n] * g_p[n], \tag{3}$$

where $h_p[n]$ represents the transmission characteristics
of the vocal tract, and $g_p[n]$ is the source excitation
initiating the speech sound. The symbol "$*$" stands for the
linear convolution operator. For voiced sounds such as
vowels, the excitation can be represented by a sequence
of impulses given by

$$g_p[n] = K \sum_{k=-\infty}^{\infty} \delta[n - kT_p], \tag{4}$$

where $T_p$ denotes the pitch associated with the particular
sound, $K$ is a scaling constant and $\delta[\cdot]$ is the discrete
delta function. In the case of unvoiced sounds, $g_p[n]$ is
a random noise process. The source-vocal tract model
is well developed and is used widely in speech synthesis
systems.

Fig. 1. Speech vs snoring: similarities and differences.

In this paper, we draw on the similarities of the speech
and snore production mechanisms to model the genuine
snore component, i.e., $s_s[n]$ in (1). Snoring and
speech sounds share some common "hardware" in the
process of generation; both are modulated by the
vocal tract, or, the acoustical properties of the upper
airways.

Some similarities as well as differences between snoring
and speech are shown in Fig. 1. In Fig. 1(a), a snoring
sample recorded from an OSA patient is illustrated.
Figs. 1(b) and 1(c) show parts of the trace in Fig. 1(a)
zoomed into two different scales, for easy visualization of
details. Similar plots for a speech signal (corresponding
to the sound "ae" in the utterance "one") are shown in
Figs. 1(d), (e) and (f). The pseudo-periodic nature of the
snoring signal is clearly evident in Fig. 1. This is similar
to the case of voiced sounds in human speech, as seen in
Figs. 1(d), (e) and (f).

The "source excitation" for snoring can be considered
to be pseudo-periodic in nature, having, possibly, its origins
in the vibrations of the structures of the upper airways.
Drawing from the speech model, we describe $s_s[n]$
as

$$s_s[n] = h_s[n] * g_s[n], \tag{5}$$

where $g_s$ is a source excitation sequence and $h_s$ is termed
the Total Airway Response (TAR). The TAR is a slowly
time-varying function, which captures the time-varying
acoustical features of the airways. The quantity $g_s[n]$ is
a pseudo-periodic sequence given by

$$g_s[n] = U[n] \sum_{k=-\infty}^{\infty} \delta[n - kT_s + \epsilon], \tag{6}$$

where $\epsilon$ is a zero-mean random variable satisfying
$\mathrm{Prob}(|\epsilon| > T) = 0$, and $U[n]$ captures the slowly
varying magnitudes of the excitation sequence; $T_s$ is the
(pseudo) periodicity associated with snoring, which is a
measure of the "pitch" of snoring.
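
As an illustration of Eqs. (5) and (6), the following Python sketch
synthesizes a snore-like signal by convolving a jittered impulse train
with a short impulse response. The sampling rate, the uniform jitter
distribution, the damped sinusoid standing in for the TAR, and all
function names are assumptions made for this illustration; none of
them is taken from the paper. Setting max_jitter to 0 yields the
strictly periodic excitation of Eq. (4) with K = 1.

```python
import math
import random


def excitation(n_samples, period, max_jitter, amplitude=lambda n: 1.0, seed=0):
    """Pseudo-periodic impulse train in the spirit of Eq. (6).

    Impulse k is placed at n = k*period - eps, where eps is a bounded,
    zero-mean integer jitter (uniform here, which is an assumption).
    `amplitude` plays the role of the slowly varying U[n].
    """
    rng = random.Random(seed)
    g = [0.0] * n_samples
    k = 0
    while k * period - max_jitter < n_samples:
        eps = rng.randint(-max_jitter, max_jitter)
        pos = k * period - eps
        if 0 <= pos < n_samples:
            g[pos] += amplitude(pos)
        k += 1
    return g


def convolve(h, g):
    """Linear convolution h * g, truncated to len(g) samples (Eq. (5))."""
    out = [0.0] * len(g)
    for n, gn in enumerate(g):
        if gn == 0.0:
            continue  # skip the many zero samples of the impulse train
        for m, hm in enumerate(h):
            if n + m < len(g):
                out[n + m] += hm * gn
    return out


def damped_resonance(length, freq, decay, fs):
    """Hypothetical stand-in for the TAR h_s[n]: one decaying sinusoid."""
    return [math.exp(-decay * n / fs) * math.sin(2 * math.pi * freq * n / fs)
            for n in range(length)]


fs = 8000                                   # assumed sampling rate (Hz)
g_s = excitation(n_samples=2400, period=240, max_jitter=8)  # about 30 ms period
h_s = damped_resonance(length=160, freq=200.0, decay=150.0, fs=fs)
s_s = convolve(h_s, g_s)                    # synthetic snore component
```

A fixed impulse response is used here for brevity; the slowly
time-varying TAR described above would require updating h_s between
excitation pulses.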

In this paper, we investigate the nature of the "pitch"
of speech and snoring. Working on clinical data, we illustrate
that via Eqs. (3)-(6), we can separate genuine snoring
from somniloquous speech, and, more importantly,
diagnose OSA consistently.

III. DATA ACQUISITION, ANNOTATION & ENHANCEMENT

A.  Data  acquisition 

The environment of a sleep laboratory is highly controlled
in order to provide the best ambience for the patient
to sleep. However, even in that environment, the
component $b[n]$ can drive the SNR of the recording to an
unacceptably low value. One of the major reasons is that
the component $s_s[n]$ has a dynamic range > 90 dB. Softer
snoring can easily get buried in the background noise
$b[n]$. In the work of this paper, SNR enhancement was
attempted through careful hardware design, and through
software-based noise reduction algorithms.

We developed a high-fidelity snore acquisition system
for the sleep laboratory. Two microphones are
used for recordings, one placed about 50 cm above the
head of the patient and the other placed near the
air-conditioner (see Fig. 2). The microphones (40-18000 Hz,
dynamic range 118 dB, Model BG4.1, Shure Brothers
Incorporated, Evanston, Illinois) are connected to a
signal-conditioning unit (INA103, Burr-Brown Corporation,
Tucson, Arizona), the output of which is connected to
the data acquisition (DAQ) card (NI4552, National
Instruments, Austin, Texas) through a 12 m shielded coaxial
cable. The sounds are digitized at a rate of 44.1
kSamples/s, with 16-bit resolution.

In order to achieve a high SNR, we chose low-noise
components in the design. Also, we used a shielded and
grounded coaxial cable to carry the signal from the
examination room to the DAQ in the monitoring station.
Shielding proved to be an essential strategy in countering
the electromagnetic interference (EMI) in the
environment.

Fig. 2. Snore data acquisition simultaneously with routine PSG
assessment.

B.  Data  Annotation 

Based on routine PSG assessments, snoring sounds
were annotated as either Benign-Snoring (BS) or
OSA-Snoring (AS), with the help of clinical specialists. We
studied 14 subjects undergoing PSG assessment at the
sleep clinic, of which 7 each were diagnosed as belonging
to classes AS and BS. About 10 episodes of snoring were
randomly selected from each of the subjects for analysis.
In patients with OSA, snoring corresponded to the 1st,
2nd and 3rd breaths after an episode of OSA attack.

C.  Noise  reduction 

We used the spectral subtraction (SS) method [5] to
reduce the background noise component $b[n]$. The SS
method was chosen for its simplicity, and its proven
ability to deliver high SNRs in speech processing
applications in noisy environments.

The primary microphone, i.e. the microphone placed
just above the patient's head, captures the signal $y[n]$.
We placed another microphone closer to the air-conditioning
vent to function as a reference microphone, which
predominantly captures the soft purring sounds from the
air conditioner [6]. The reference microphone output
is a reasonable representation of the component $b[n]$,
which represents both electrical and acoustical background
noise.

Taking the Fourier transform of (2), we get $Y(f) = S(f) + B(f)$,
where $Y(f)$, $S(f)$ and $B(f)$ respectively
denote the Fourier transforms of $y[n]$, $s[n]$ and $b[n]$.

Then the spectral subtraction can be expressed by

$$|\hat{S}(f)|^b = |Y(f)|^b - \alpha\,\overline{|B(f)|^b}, \tag{7}$$

where $|\hat{S}(f)|$ denotes the estimate of $|S(f)|$ and $\overline{|B(f)|^b}$ is
a measure of the time-averaged noise spectra. The exponent
$b$ is set to 1 for magnitude spectral subtraction and 2
for power spectral subtraction; $\alpha$ is the over-subtraction
factor and controls the amount of noise subtracted from
the noisy signal. To prevent negative estimates, the spectral
subtraction magnitude output is further processed
as

$$|\hat{S}(f)| = \begin{cases} |\hat{S}(f)|, & \text{for } |\hat{S}(f)| > \beta\,|Y(f)| \\ \beta\,|Y(f)|, & \text{else} \end{cases} \tag{8}$$

where the parameter $\beta$ determines the remaining noise
floor. The phase spectrum of the de-noised snoring signal
is approximated [5] to be the same as the phase spectrum
of the noisy snoring signal, $\phi_Y(f)$. The spectrum of the
de-noised snoring signal can then be obtained from

$$\hat{S}(f) = |\hat{S}(f)|\,e^{j\phi_Y(f)}. \tag{9}$$
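
The single-frame processing of Eqs. (7)-(9) can be sketched in Python
as follows. This is a minimal illustration, not the authors'
implementation: the function names are hypothetical, framing and
overlap-add are omitted, and a naive O(N^2) DFT is used only to keep
the example free of external libraries (an FFT routine would normally
be used). The toy signals stand in for the primary and reference
microphone recordings.

```python
import cmath
import math


def dft(x):
    """Naive discrete Fourier transform (for illustration only)."""
    n = len(x)
    return [sum(x[t] * cmath.exp(-2j * cmath.pi * k * t / n) for t in range(n))
            for k in range(n)]


def idft(X):
    """Naive inverse discrete Fourier transform."""
    n = len(X)
    return [sum(X[k] * cmath.exp(2j * cmath.pi * k * t / n) for k in range(n)) / n
            for t in range(n)]


def average_noise_spectrum(noise_frames, b=1.0):
    """Time-averaged |B(f)|^b over frames of the reference recording."""
    acc = [0.0] * len(noise_frames[0])
    for frame in noise_frames:
        for k, value in enumerate(dft(frame)):
            acc[k] += abs(value) ** b
    return [a / len(noise_frames) for a in acc]


def spectral_subtract(frame, noise_mag_b, b=1.0, alpha=1.0, beta=0.05):
    """Apply Eqs. (7)-(9) to one frame and return the de-noised samples."""
    Y = dft(frame)
    S_hat = []
    for Yk, Nk in zip(Y, noise_mag_b):
        mag_y = abs(Yk)
        diff = mag_y ** b - alpha * Nk                    # Eq. (7)
        mag_s = diff ** (1.0 / b) if diff > 0 else 0.0
        floor = beta * mag_y
        mag_s = mag_s if mag_s > floor else floor         # Eq. (8)
        S_hat.append(cmath.rect(mag_s, cmath.phase(Yk)))  # Eq. (9): noisy phase
    return [value.real for value in idft(S_hat)]


# Toy example: a tone (stand-in for the snore) plus a hum (stand-in for b[n]).
n = 32
tone = [math.sin(2 * math.pi * 4 * t / n) for t in range(n)]
hum = [0.3 * math.sin(2 * math.pi * 10 * t / n) for t in range(n)]
noisy = [a + c for a, c in zip(tone, hum)]
noise_est = average_noise_spectrum([hum], b=1.0)
denoised = spectral_subtract(noisy, noise_est, b=1.0, alpha=1.0, beta=0.05)
```

In this toy case the hum is removed down to the floor set by beta,
because the noise estimate matches the interference exactly; with real
recordings the reference microphone provides only an approximation of
the noise.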

D.  The  Detection  of  Pitch 

The noise-suppressed observation $\hat{y}[n]$ is used to
estimate the pitch associated with snoring, based on a
modified version of the well-known cepstrum-based pitch
detector (Fig. 3). First we computed the envelope
$a[n]$ of $\hat{y}[n]$, and windowed $a[n]$ with $w[n]$ to obtain
$a_w[n] = a[n]\,w[n]$. Then the complex cepstrum $\hat{a}_w[k]$
of $a_w[n]$ was computed according to

$$\hat{a}_w[k] = F^{-1}\{\log(F(a_w[n]))\}, \tag{10}$$

where $F$ and $F^{-1}$ respectively denote the discrete Fourier
transform and its inverse. The periodicity in $\hat{y}[n]$ will
appear as a peak in $\hat{a}_w[k]$, with the location of the peak
corresponding to the period.
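
A compact Python sketch of a cepstrum-based period estimator along
these lines is given below. Several choices are assumptions rather
than details from the paper: the envelope is taken as a rectified,
moving-average-smoothed signal, the real cepstrum (log magnitude) is
used as a simplification of the complex cepstrum in Eq. (10), the peak
is searched within a user-supplied period range, and the toy sampling
rate and window length are chosen only so that the naive DFT stays
fast. All function names are hypothetical.

```python
import cmath
import math


def envelope(y, smooth_len):
    """Rectify and smooth with a causal moving average (assumed detector)."""
    out, acc = [], 0.0
    for n, value in enumerate(y):
        acc += abs(value)
        if n >= smooth_len:
            acc -= abs(y[n - smooth_len])
        out.append(acc / smooth_len)
    return out


def hamming(n):
    """Hamming window of length n."""
    return [0.54 - 0.46 * math.cos(2 * math.pi * i / (n - 1)) for i in range(n)]


def log_spectrum(x, eps=1e-12):
    """log|DFT(x)| via a naive DFT; eps guards against log(0)."""
    n = len(x)
    return [math.log(abs(sum(x[t] * cmath.exp(-2j * math.pi * k * t / n)
                             for t in range(n))) + eps)
            for k in range(n)]


def cepstral_period(y, fs, min_period, max_period, smooth_len=8):
    """Estimate the period (seconds) from the largest cepstral peak."""
    a = envelope(y, smooth_len)
    a_w = [ai * wi for ai, wi in zip(a, hamming(len(a)))]
    log_mag = log_spectrum(a_w)
    n = len(log_mag)
    lo, hi = int(round(min_period * fs)), int(round(max_period * fs))
    best_q, best_val = lo, -math.inf
    for q in range(lo, hi + 1):
        # Real cepstrum at quefrency q: inverse DFT of the log magnitude.
        c = sum(log_mag[k] * math.cos(2 * math.pi * k * q / n)
                for k in range(n)) / n
        if c > best_val:
            best_q, best_val = q, c
    return best_q / fs


# Toy input: decaying bursts repeating every 32 samples at fs = 1000 Hz.
fs = 1000
burst = [(-0.6) ** m for m in range(16)]
y = [0.0] * 256
for start in range(0, 256, 32):
    for m, value in enumerate(burst):
        y[start + m] += value
period = cepstral_period(y, fs, min_period=0.016, max_period=0.048)
print("estimated period (s):", period)
```

Restricting the search range keeps the detector away from the
low-quefrency region, where the spectral envelope rather than the
excitation dominates the cepstrum.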


Fig. 3. The modified cepstrum-based pitch detector.


IV. RESULTS

In Fig. 4(a) we show a 3-second episode of typical
snoring data recorded from a patient. Sound playback
of the record confirmed that the background noise
profoundly deteriorated the quality of the recording. We
used the SS technique with $b = 1$, $\alpha = 1$ and $\beta = 0.05$
to obtain the noise-suppressed data shown in Fig. 4(b).
Playback indicated that the SS technique had successfully
suppressed the background noise. In general, an
improvement of SNR in the range of 6 to 8 dB could be
achieved, where the SNR is defined as a segmental SNR
[4], as used in the context of traditional speech analysis.
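
For reference, the segmental SNR commonly used in speech analysis
averages per-frame SNR values in decibels. The Python sketch below
follows that common definition; it assumes a clean reference signal is
available and uses a hypothetical function name and frame length. The
exact procedure used to obtain the 6 to 8 dB figure is not described
in the text.

```python
import math


def segmental_snr(reference, processed, frame_len):
    """Mean of per-frame SNRs (dB) over non-overlapping frames.

    Frames with zero signal energy or zero error are skipped, since
    their SNR in dB is undefined.
    """
    snrs = []
    for start in range(0, len(reference) - frame_len + 1, frame_len):
        ref = reference[start:start + frame_len]
        out = processed[start:start + frame_len]
        signal_energy = sum(r * r for r in ref)
        error_energy = sum((r - o) ** 2 for r, o in zip(ref, out))
        if signal_energy > 0.0 and error_energy > 0.0:
            snrs.append(10.0 * math.log10(signal_energy / error_energy))
    if not snrs:
        raise ValueError("no frame with a defined SNR")
    return sum(snrs) / len(snrs)


# A constant 10% amplitude error gives 20 dB in every frame.
value = segmental_snr([1.0] * 8, [1.1] * 8, frame_len=4)
```

Averaging in the dB domain weights quiet and loud frames equally,
which matters for signals with a wide dynamic range.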

Figs. 4(c) and 4(d) illustrate the importance of noise
reduction in pitch detection. In both frames 4(c) and
4(d), S(n) represents shorter data traces (approximately
60 ms in duration) extracted from the traces shown in
Figs. 4(a) and 4(b) respectively. In Fig. 4(c), the raw data
together with its envelope $a[n]$ and the cepstrum $\hat{a}[n]$ are
shown. In Fig. 4(d), similar plots for the noise-suppressed
data are shown. Only the de-noised signal shows the
periodicity clearly, as also evidenced by the well-defined
cepstral peak around 30 ms. Comparing Fig. 4(c) with
Fig. 4(d), we conclude that noise suppression has an important
role in the snore analysis regime. The importance
of noise removal takes on added significance if the snore
testing is to be done in a home setting, away from the
carefully controlled environment of the hospital sleep
laboratory.

Fig. 4. The importance of background noise suppression.

In pitch analysis of speech, the length of the data window
$w[n]$ is taken to be around 40 ms. In snoring analysis, however,
this is generally insufficient, because not enough
repetitive structures (TAR waves) fall within 40 ms for
the periodicity to be well detected. Thus, all the results
reported in this paper use a Hamming window of length
80 ms, which proved to be of sufficient length.


All 14 patients in the database were systematically
evaluated according to the following procedure:

Step 1: Remove the noise from $y[n]$ using the SS technique
to obtain $\hat{y}[n]$.

Step 2: Define a sliding Hamming window $w[n]$ of
length 80 ms, with an overlap factor of 50%. Using the cepstral
technique, estimate the periods $T_i^{(j)}$ for each segment
$i$, $i = 1, 2, \ldots, 30$, of a given snoring/speech episode $j$ as
the Hamming window slides over.

Step 3: Calculate the mean $T^{(j)}$ and the standard deviation
$\sigma^{(j)}$ for the snoring episode $j$, based on the
30 estimates $T_i^{(j)}$, $i = 1, 2, \ldots, 30$. Form the pair
$(\sigma^{(j)}, T^{(j)})$ for episode $j$.

Step 4: Calculate $(\sigma^{(j)}, T^{(j)})$ for all episodes of
snoring/speech data belonging to a given class AS, BS or
Speech.
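
Steps 2 and 3 of the procedure above can be sketched in Python as
follows. The per-window period estimator is passed in as a callable,
since any suitable detector could be plugged in; the function names
are hypothetical, and the use of the population standard deviation is
an assumption (the text does not say which estimator was used).

```python
import statistics


def sliding_windows(x, win_len, overlap=0.5):
    """Split x into windows of win_len samples with fractional overlap."""
    hop = max(1, int(round(win_len * (1.0 - overlap))))
    return [x[s:s + win_len] for s in range(0, len(x) - win_len + 1, hop)]


def pitch_jitter(periods):
    """Return (sigma, mean period) for one episode (Step 3)."""
    return statistics.pstdev(periods), statistics.mean(periods)


def episode_feature(signal, win_len, estimate_period, overlap=0.5):
    """Steps 2-3: estimate a period per window, then summarize the episode.

    `estimate_period` is any callable mapping one window of samples to
    a period estimate; windowing (e.g. Hamming) is left to that callable.
    """
    periods = [estimate_period(w)
               for w in sliding_windows(signal, win_len, overlap)]
    return pitch_jitter(periods)


# Three hypothetical period estimates (same units as the pitch axis).
sigma, mean_T = pitch_jitter([30.0, 32.0, 34.0])
```

With an 80 ms window and 50% overlap, 30 segments span 1.24 s of
signal (80 ms plus 29 hops of 40 ms).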

In Fig. 5, pairs of data $(\sigma^{(j)}, T^{(j)})$ are plotted on a
$[\sigma, T]$-plane ('pitch-jitter' graph), where symbols ‘+’,
and ‘o’ denote classes AS, BS and Speech
respectively. According to Fig. 5, the snoring signals can
be successfully separated into AS and BS classes using
the feature $[T, \sigma]$, based on a linear decision boundary
$T = 1.85\sigma + 10.0$. This boundary separates the given
data into the AS class with 92.31% accuracy and the BS class with
90.7% accuracy. The separation of Speech from the rest
of the data was 100% successful.
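
Applying the linear decision boundary to a feature pair amounts to
checking the sign of T - (1.85 sigma + 10.0). The Python sketch below
reports only which side of the line a point falls on; the mapping of
sides to the AS and BS classes is read from Fig. 5 and is deliberately
not encoded here. The function names are hypothetical, and the units
are assumed to be those of the axes in Fig. 5.

```python
def boundary_margin(sigma, mean_period, slope=1.85, intercept=10.0):
    """Signed vertical distance of (sigma, T) from the line T = slope*sigma + intercept."""
    return mean_period - (slope * sigma + intercept)


def side_of_boundary(sigma, mean_period):
    """Return 'above' or 'below' relative to the decision line."""
    return "above" if boundary_margin(sigma, mean_period) > 0 else "below"


# Two hypothetical feature pairs, for illustration only.
print(side_of_boundary(2.0, 20.0))
print(side_of_boundary(10.0, 20.0))
```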

V.  CONCLUSION 

We developed a mathematical model for snoring, in the
form of a linear convolution between a pseudo-periodic
excitation sequence and a quantity (TAR) representing the
acoustic-mechanical properties of the upper airway. We
proposed the use of pseudo-periodicity as a signature for
OSA. Snoring can easily be discriminated from human
speech based on the proposed signature. Furthermore,
the pseudo-periodicity itself provides a promising feature
to diagnose OSA. Noise reduction schemes are important
to obtain good results in snore analysis.


Fig. 5. The pitch-jitter graph for snoring and speech. Symbols
‘+’, and ‘o’ denote classes AS, BS and Speech respectively.
